Reward Backpropagation Prioritized Experience Replay

Authors

  • Yangxin Zhong
  • Borui Wang
  • Yuanfang Wang
Abstract

Sample efficiency is an important topic in reinforcement learning. With limited data and experience, how can we converge to a good policy more quickly? In this paper, we propose a new experience replay method called Reward Backpropagation, which gives higher minibatch sampling priority to transitions (s, a, r, s′) with r ≠ 0 and, once such a transition has been sampled, propagates the priority backward to its preceding transition, and so on. Experiments show that a DQN model combined with our method converges 1.5x faster than vanilla DQN and also achieves higher performance.
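
The abstract gives only the outline of the method, so the following Python sketch is one possible reading of it rather than the authors' implementation. The class name `RewardBackpropReplay`, the two priority levels `high` and `low`, the reset of a priority after sampling, and the rule that priority never crosses an episode boundary are all assumptions made for illustration.

```python
import random


class RewardBackpropReplay:
    """Illustrative replay buffer (hypothetical; not the authors' code)."""

    def __init__(self, high=10.0, low=1.0, seed=None):
        self.high = high        # assumed sampling weight for prioritized transitions
        self.low = low          # assumed baseline sampling weight
        self.transitions = []   # (s, a, r, s_next, done) tuples in arrival order
        self.priorities = []    # one sampling weight per stored transition
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next, done):
        # Transitions with nonzero reward start with the higher priority.
        self.transitions.append((s, a, r, s_next, done))
        self.priorities.append(self.high if r != 0 else self.low)

    def _has_predecessor(self, i):
        # Index i-1 precedes i in the same episode only if it did not end one.
        return i > 0 and not self.transitions[i - 1][4]

    def sample(self, batch_size):
        idxs = self.rng.choices(
            range(len(self.transitions)), weights=self.priorities, k=batch_size
        )
        # Transitions that were drawn while holding the higher priority.
        hot = [i for i in set(idxs) if self.priorities[i] > self.low]
        for i in hot:
            self.priorities[i] = self.low           # assumption: reset once sampled
        for i in hot:
            if self._has_predecessor(i):
                self.priorities[i - 1] = self.high  # pass priority one step back
        return [self.transitions[i] for i in idxs]
```

With `low=0.0`, only prioritized transitions can be drawn, which makes the backward sweep easy to see: a three-step episode ending in a reward is replayed last step first, then the middle step, then the first. A nonzero `low`, as in the defaults, keeps every transition reachable.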

Similar resources

Prioritized Experience Replay

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experienc...

Distributed Prioritized Experience Replay

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a s...

ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling

ViZDoom is a robust, first-person shooter reinforcement learning environment, characterized by a significant degree of latent state information. In this paper, double-Q learning and prioritized experience replay methods are tested under a certain ViZDoom combat scenario using a competitive deep recurrent Q-network (DRQN) architecture. In addition, an ensembling technique known as snapshot ensem...

Prioritized memory access explains planning and hippocampal replay

To make decisions, animals must evaluate outcomes of candidate choices by accessing memories of relevant experiences. Recent theories suggest that phenomena of habits and compulsion can be reinterpreted as selectively omitting such computations. Yet little is known about the more granular question of which specific experiences are considered or ignored during deliberation, which ultimately gove...

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additiona...

Publication date: 2017